-
-
-
- IBM Personal Computer Assembly
- Language Tutorial
-
- Joshua Auerbach
- Yale University
- Yale Computer Center
- 175 Whitney Avenue
- P. O. Box 2112
- New Haven, Connecticut 06520
- Installation Code YU
- Integrated Personal Computers Project
- Communications Group
- Communications and Data Base Division
- Session C316
-
-
- This talk is for people who are just getting started with the PC MACRO
- Assembler. Maybe you are just contemplating doing some coding in
- assembler, maybe you have tried it with mixed success. If you are here to
- get aimed in the right direction, to get off to a good start with the
- assembler, then you have come for the right reason. I can't promise you'll
- get what you want, but I'll do my best.
- On the other hand, if you have already turned out some working assembler
- code, then this talk is likely to be on the elementary side for you. If
- you want to review a few basics and have nowhere else pressing to go, then
- by all means stay.
-
- Why Learn Assembler?
- ____________________
- The reasons for LEARNING assembler are not the same as the reasons for
- USING it in a particular application. But, we have to start with some of
- the reasons for using it and then I think the reasons for learning it will
- become clear.
- First, let's dispose of a bad reason for using it. Don't use it just
- because you think it is going to execute faster. A particular sequence of
- ordinary bread-and-butter computations written in PASCAL, C, FORTRAN, or
- compiled BASIC can do the job just about as fast as the same algorithm
- coded in assembler. Of course, interpretive BASIC is slower, but if you
- have a BASIC application which runs too slowly you probably want to try
- compiling it before you think too much about translating parts of it to
- another language.
- On the other hand, high level languages do tend to isolate you from the
- machine. That is both their strength and their weakness. Usually, when
- implemented on a micro, a high level language provides an escape mechanism
- to the underlying operating system or to the bare machine. So, for
- example, BASIC has its PEEK and POKE. But, the route to the bare machine
- is often a circuitous one, leading to tricky programming which is hard to
- follow.
- For those of us working on PC's connected to SHARE-class mainframes, we are
- generally concerned with three interfaces: the keyboard, the screen, and
- the communication line or lines. All three of these entities raise machine
- dependent issues which are imperfectly addressed by the underlying operat-
- ing system or by high level languages.
- Sometimes, the system or the language does too little for you. For
- example, with the asynch adapter, the system provides no interrupt handler,
- no buffer, and no flow control. The application is stuck with the respon-
- sibility for monitoring that port and not missing any characters, then
- deciding what to do with all errors. BASIC does a reasonable job on some
- of this, but that is only BASIC. Most other languages do less.
- Sometimes, the system may do too much for you. System support for the key-
- board is an example. At the hardware level, all 83 keys on the keyboard
- send unique codes when they are pressed, held down, and released. But,
- someone has decided that certain keys, like Num Lock and Scroll Lock are
- going to do certain things before the application even sees them and can't
- therefore be used as ordinary keys.
- Sometimes, the system does about the right amount of stuff but does it less
- efficiently than it should. System support for the screen is in this
- class. If you use only the official interface to the screen you sometimes
- slow your application down unacceptably. I said before, don't use assem-
- bler just to speed things up, but there I was talking about mainline code,
- which generally can't be speeded up much by assembler coding. A critical
- system interface is a different matter: sometimes we may have to use
- assembler to bypass a hopelessly inefficient implementation. We don't want
- to do this if we can avoid it, but sometimes we can't.
- Assembly language code can overcome these deficiencies. In some cases, you
- can also overcome these deficiencies by judicious use of the escape valves
- which your high level language provides. In BASIC, you can PEEK and POKE
- and INP and OUT your way around a great many issues. In many other lan-
- guages you can issue system calls and interrupts and usually manage, one
- way or other, to modify system memory. Writing handlers to take real-time
- hardware interrupts from the keyboard or asynch port, though, is still
- going to be a problem in most languages. Some languages claim to let you
- do it but I have yet to see an acceptably clean implementation done that
- way.
- The real reason why assembler is better than "tricky POKEs" for writing
- machine-dependent code, though, is the same reason why PASCAL is better
- than assembler for writing a payroll package: it is easier to maintain.
- Let the high level language do what it does best, but recognize that there
- are some things which are best done in assembler code. The assembler,
- unlike the tricky POKE, can make judicious use of equates, macros, labels,
- and appropriately placed comments to show what is really going on in this
- machine-dependent realm where it thrives.
- So, there are times when it becomes appropriate to write in assembler;
- given that, if you are a responsible programmer or manager, you will want
- to be "assembler-literate" so you can decide when assembler code should be
- written.
- What do I mean by "assembler-literate?" I don't just mean understanding
- the 8086 architecture; I think, even if you don't write much assembler code
- yourself, you ought to understand the actual process of turning out assem-
- bler code and the various ways to incorporate it into an application. You
- ought to be able to tell good assembler code from bad, and appropriate
- assembler code from inappropriate.
-
- Steps to becoming ASSEMBLER-LITERATE
- ____________________________________
- 1. Learn the 8086 architecture and most of the instruction set. Learn
- what you need to know and ignore what you don't. Reading: The 8086
- Primer by Stephen Morse, published by Hayden. You need to read only
- two chapters, the one on machine organization and the one on the
- instruction set.
- 2. Learn about a few simple DOS function calls. Know what services the
- operating system provides. If appropriate, learn a little about other
- systems too. It will aid portability later on. Reading: appendices D
- and E of the PC DOS manual.
- 3. Learn enough about the MACRO assembler and the LINKer to write some
- simple things that really work. Here, too, the main thing is figuring
- out what you don't need to know. Whatever you do, don't study the sam-
- ple programs distributed with the assembler unless you have nothing
- better!
- 4. At the same time as you are learning the assembler itself, you will
- need to learn a few tools and concepts to properly combine your assem-
- bler code with the other things you do. If you plan to call assembler
- subroutines from a high level language, you will need to study the
- interface notes provided in your language manual. Usually, this forms
- an appendix of some sort. If you plan to package your assembler rou-
- tines as .COM programs you will need to learn to do this. You should
- also learn to use DEBUG.
- 5. Read the Technical Reference, but very selectively. The most important
- things to know are the header comments in the BIOS listing. Next, you
- will want to learn about the RS 232 port and maybe about the video
- adapters.
-
- Notice that the key thing in all five phases is being selective. It is
- easy to conclude that there is too much to learn unless you can throw away
- what you don't need. Most of the rest of this talk is going to deal with
- this very important question of what you need and don't need to learn in
- each phase. In some cases, I will have to leave you to do almost all of
- the learning; in others, I will teach a few salient points, enough, I hope,
- to get you started. I hope you understand that all I can do in an hour is
- get you started on the way.
-
- Phase 1: Learn the architecture and instruction set
- ____________________________________________________
- The Morse book might seem like a lot of book to buy for just two really
- important chapters; other books devote a lot more space to the instruction
- set and give you a big beautiful reference page on each instruction. And,
- some of the other things in the Morse book, although interesting, really
- aren't very vital and are covered too sketchily to be of any real help.
- The reason I like the Morse book is that you can just read it; it has a
- very conversational style, it is very lucid, it tells you what you really
- need to know, and a little bit more which is by way of background; because
- nothing really gets belabored too much, you can gracefully forget the things
- you don't use. And, I very much recommend READING Morse rather than
- studying it. Get the big picture at this point.
- Now, you want to concentrate on those things which are worth fixing in mem-
- ory. After you read Morse, you should relate what you have learned to this
- outline.
- 1. You want to fix in your mind the idea of the four segment registers
- CODE, DATA, STACK, and EXTRA. This part is pretty easy to grasp. The
- 8086 and the 8088 use 20 bit addresses for memory, meaning that they
- can address up to 1 megabyte of memory. But, the registers and the
- address fields in all the instructions are no more than 16 bits long.
- So, how to address all of that memory? Their solution is to put
- together two 16 bit quantities like this:
-      calculation     SSSS0   ---- value in the relevant segment register SHL 4
-      depicted in      AAAA   ---- apparent address from register or instruction
-      hexadecimal   --------
-                      RRRRR   ---- real address placed on address bus
- In other words, any time memory is accessed, your program will supply a
- sixteen bit address. Another sixteen bit address is acquired from a
- segment register, left shifted four bits (one nibble) and added to it
- to form the real address. You can control the values in the segment
- registers and thus access any part of memory you want. But the segment
- registers are specialized: one for code, one for most data accesses,
- one for the stack (which we'll mention again) and one "extra" one for
- additional data accesses.
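- As a worked instance of the calculation above, here is a sketch that
- stores one character in the screen buffer. It assumes a color/graphics
- adapter in 80 column text mode, whose buffer begins at segment B800 (the
- monochrome adapter uses B000); the choice of segment and offset is mine,
- purely for illustration.

```asm
;      B8000   ---- value in ES (B800) SHL 4
;       00A0   ---- offset supplied in DI (start of second screen row)
;     ------
;      B80A0   ---- real address placed on address bus
;
        MOV  AX,0B800H            ;a segment register cannot be loaded with
        MOV  ES,AX                ;an immediate value, so go through AX
        MOV  DI,0A0H              ;offset: 80 columns x 2 bytes = 160 = A0H
        MOV  BYTE PTR ES:[DI],'A' ;store 'A' at real address B80A0
```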
- Most people, when they first learn about this addressing scheme, become
- obsessed with converting everything to real 20 bit addresses. After a
- while, though, you get used to thinking in segment/offset form. You
- tend to get your segment registers set up at the beginning of the
- program, change them as little as possible, and think just in terms of
- symbolic locations in your program, as with any assembly language.
- EXAMPLE:
-         MOV  AX,DATASEG
-         MOV  DS,AX          ;Set value of Data segment
-         ASSUME DS:DATASEG   ;Tell assembler DS is usable
-         .......
-         MOV  AX,PLACE       ;Access storage symbolically by 16 bit address
- In the above example, the assembler knows that no special issues are
- involved because the machine generally uses the DS register to complete
- a normal data reference.
- If you had used ES instead of DS in the above example, the assembler
- would have known what to do, also. In front of the MOV instruction
- which accessed the location PLACE, it would have placed the ES segment
- prefix. This would tell the machine that ES should be used, instead of
- DS, to complete the address.
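- Here is a sketch of the ES case, reusing the names DATASEG and PLACE
- from the example above. It assumes DS is not pointing at DATASEG, which
- is what the NOTHING operand tells the assembler. The override can also
- be written out by hand when you would rather not rely on ASSUME.

```asm
        MOV  AX,DATASEG
        MOV  ES,AX                      ;Set value of Extra segment
        ASSUME DS:NOTHING,ES:DATASEG    ;Tell assembler only ES is usable
        MOV  AX,PLACE           ;Assembler adds the ES prefix byte (26H) itself
        MOV  AX,ES:PLACE        ;Same reference with the override written out
```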
- Some conventions make it especially easy to forget about segment regis-
- ters. For example, any program of the COM type gets control with all
- four segment registers containing the same value. This program exe-
- cutes in a simplified 64K address space. You can go outside this
- address space if you want but you don't have to.
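- A minimal sketch of a COM-type program shows how little segment
- bookkeeping is needed. The segment name CSEG, the labels, and the
- message text are made up for illustration; the two DOS services used are
- function 9 of INT 21H (print a string ending in '$') and INT 20H
- (terminate the program).

```asm
CSEG    SEGMENT
        ASSUME CS:CSEG,DS:CSEG,ES:CSEG,SS:CSEG
        ORG  100H               ;COM programs begin at offset 100H
START:  MOV  DX,OFFSET MSG      ;DS already equals CS, so no setup is needed
        MOV  AH,9               ;DOS function 9: print string at DS:DX
        INT  21H
        INT  20H                ;return to DOS
MSG     DB   'Hello from a COM program',13,10,'$'
CSEG    ENDS
        END  START
```

- After assembling and linking, a program like this still has to be run
- through EXE2BIN to become a .COM file.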
- 2. You will want to learn what other registers are available and learn
- their personalities:
- AX and DX are general purpose registers. They become special only
- when accessing machine and system interfaces.
- CX is a general purpose register which is slightly specialized for
- counting.
- BX is a general purpose register which is slightly specialized for
- forming base-displacement addresses.
- AX, BX, CX, and DX can each be divided in half, forming AH, AL, BH, BL,
- CH, CL, DH, DL.
- SI and DI are strictly 16 bit. They can be used to form indexed
- addresses (like BX) and they are also used to point to strings.
- SP is hardly ever manipulated. It is there to provide a stack.
- BP is a manipulable cousin to SP. Use it to access data which has
- been pushed onto the stack.
- Most sixteen bit operations are legal (even if unusual) when per-
- formed in SI, DI, SP, or BP.
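- To see several of these personalities at once, here is a sketch that
- adds up a table of bytes. TABLE and TABLEN are hypothetical names, and
- DS is assumed to address the segment which holds TABLE.

```asm
;Data (in the segment addressed by DS)
TABLE   DB   10,20,30,40        ;hypothetical table of bytes
TABLEN  EQU  $-TABLE            ;number of bytes in TABLE
;Code
        MOV  BX,OFFSET TABLE    ;BX as base address
        XOR  SI,SI              ;SI as index, starting at zero
        MOV  CX,TABLEN          ;CX as loop counter
        XOR  AX,AX              ;AX as running total
NEXT:   ADD  AL,[BX+SI]         ;add one byte into the low half of AX
        ADC  AH,0               ;pass any carry into the high half
        INC  SI                 ;step to the next byte
        LOOP NEXT               ;decrement CX, repeat until it reaches zero
;                               ;AX now holds 100
```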
- 3. You will want to learn the classifications of operations available
- WITHOUT getting hung up in the details of how 8086 opcodes are con-
- structed.
- 8086 opcodes are complex. Fortunately, the assembler opcodes used to
- assemble them are simple. When you read a book like Morse, you will
- learn some things which are worth knowing but NOT worth dwelling on.
- a. 8086 and 8088 instructions can be broken up into subfields and bits
- with names like R/M, MOD, S and W. These parts of the instruction
- modify the basic operation in such ways as: whether it is 8 bit or
- 16 bit; if 16 bit, whether all 16 bits of the data are given;
- whether the instruction is register to register, register to
- memory, or memory to register; for operands which are registers,
- which register; and for operands which are memory, what base and
- index registers should be used in finding the data.
- b. Also, some instructions are actually represented by several differ-
- ent machine opcodes depending on whether they deal with immediate
- data or not, or on other issues, and there are some expedited forms
- which assume that one of the arguments is the most commonly used
- operand, like AX in the case of arithmetic.
- There is no point in memorizing any of this detail; just distill the
- bottom line, which is, what kinds of operand combinations EXIST in the
- instruction set and what kinds don't. If you ask the assembler to ADD
- two things and the two things are things for which there is a legal ADD
- instruction somewhere in the instruction set, the assembler will find
- the right instruction and fill in all the modifier fields for you.
- I guess if you memorized all the opcode construction rules you might
- have a crack at being able to disassemble hex dumps by eye, like you
- may have learned to do somewhat with 370 assembler. I submit to you
- that this feat, if ever mastered by anyone, would be in the same class
- as playing the "Minute Waltz" in a minute; a curiosity only.
- Here is the basic matrix you should remember:
-      Two operands:        One operand:
-        R   <-- M            R
-        M   <-- R            M
-        R   <-- R            S *
-        R|M <-- I
-        R|M <-- S *
-        S   <-- R|M *
- * -- data moving instructions (MOV, PUSH, POP) only
- S -- segment register (CS, DS, ES, SS)
- R -- ordinary register (AX, BX, CX, DX, SI, DI, BP, SP,
- AH, AL, BH, BL, CH, CL, DH, DL)
- M -- one of the following
- pure address
- [BX]+offset
- [BP]+offset
- any of the above indexed by SI
- any of the first three indexed by DI
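- Here is a sketch with one instruction for most cells of the matrix.
- COUNT and TOTAL are hypothetical word variables in the data segment.
- Each line is meant to stand alone as an illustration of a legal form;
- taken together they do nothing useful.

```asm
        MOV  AX,COUNT           ;R <-- M   (pure address)
        MOV  COUNT,AX           ;M <-- R
        ADD  AX,BX              ;R <-- R
        ADD  COUNT,5            ;M <-- I
        SUB  CL,[BX+SI+2]       ;R <-- M   ([BX]+offset indexed by SI)
        INC  BYTE PTR [BP+4]    ;one operand, M   ([BP]+offset)
        PUSH ES                 ;one operand, S   (PUSH and POP only)
        MOV  ES,AX              ;S <-- R   (MOV only)
        POP  ES                 ;one operand, S
;Not in the matrix, so the assembler will reject them:
;       MOV  COUNT,TOTAL        ;M <-- M
;       MOV  ES,1234H           ;S <-- I
;The usual way around both is to go through a register:
        MOV  AX,TOTAL
        MOV  COUNT,AX           ;COUNT now equals TOTAL
```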
- 4. Of course, you want to learn the operations themselves. As I've sug-
- gested, you want to learn the op codes as the assembler presents them,
- not as the CPU machine language presents them. So, even though there
- are many MOV op codes you don't need to learn them. Basically, here is
- the instruction set:
- a. Ordinary two operand instructions. These instructions perform an
- operation and leave the result in place of one of the operands.
- They are
- 1) ADD and ADC -- addition, with or without including a carry from
- a previous addition
- 2) SUB and SBB -- subtraction, with or without including a borrow
- from a previous subtraction
- 3) CMP -- compare. It is useful to think of this as a subtraction
- with the answer being thrown away and neither operand actually
- changed
- 4) AND, OR, XOR -- typical boolean operations
- 5) TEST -- like an AND, except the answer is thrown away and nei-
- ther operand is changed.
- 6) MOV -- move data from source to target
- 7) LDS, LES, LEA -- some specialized forms of MOV with side
- effects
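- The pairing of ADD with ADC (and SUB with SBB) is what lets you do
- arithmetic wider than 16 bits. Here is a sketch, assuming two
- hypothetical doubleword variables TOTAL and AMOUNT, each stored low word
- first, with CMP and TEST used afterward to examine the result.

```asm
;Data (in the segment addressed by DS)
TOTAL   DW   0FFFFH,0001H       ;hypothetical value 0001FFFFH, low word first
AMOUNT  DW   0001H,0000H        ;hypothetical value 00000001H
;Code
        MOV  AX,AMOUNT          ;low word of AMOUNT
        ADD  TOTAL,AX           ;add low words; carry flag records overflow
        MOV  AX,AMOUNT+2        ;high word of AMOUNT (MOV leaves flags alone)
        ADC  TOTAL+2,AX         ;add high words plus the carry
;                               ;TOTAL is now 00020000H
        CMP  TOTAL+2,2          ;high word compared with 2; only flags change
        JNE  DONE               ;not taken here, since the high word is 2
        TEST BYTE PTR TOTAL,1   ;low bit examined; only flags change
        JNZ  DONE               ;not taken here, since the low bit is clear
DONE:
```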
- b. Ordinary one operand instructions. These can take any of the
- operand forms described above. Usually, they perform the operation and
- leave the result in the stated place:
- 1) INC -- increment contents
- 2) DEC -- decrement contents
- 3) NEG -- twos complement
- 4) NOT -- ones complement
- 5) PUSH -- value goes on stack (operand location itself unchanged)
- 6) POP -- value taken from stack, replaces current value
- c. Now you touch on some instructions which do not follow the general
- operand rules but which require the use of certain registers. The
- important ones are
- 1) The multiply and divide instructions
- 2) The "adjust" instructions which help in performing arithmetic
- on ASCII or packed decimal data
- 3) The shift and rotate instructions. These have a restriction on
- the second operand: it must either be the immediate value 1 or
- the contents of the CL register.
- 4) IN and OUT which send or receive data from one of the 1024
- hardware ports.
- 5) CBW and CWD -- convert byte to word or word to doubleword by
- sign extension
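- A sketch of the register conventions these instructions impose. The
- numbers are arbitrary. Port 3FDH (the line status register of the first
- asynch adapter) is my choice of example for a port number above 255,
- which has to be supplied in DX.

```asm
        MOV  AX,1000            ;multiplicand must be in AX (AL for bytes)
        MOV  BX,300
        MUL  BX                 ;DX:AX = 300000 (DX = 0004H, AX = 93E0H)
        MOV  CX,7
        DIV  CX                 ;DX:AX / CX: AX = 42857 quotient, DX = 1 remainder

        MOV  AL,-5
        CBW                     ;AX = -5 (sign of AL extended through AH)
        CWD                     ;DX:AX = -5 (sign of AX extended through DX)
        MOV  BX,2
        IDIV BX                 ;signed divide: AX = -2, DX = -1

        MOV  AX,1
        SHL  AX,1               ;a shift count of 1 may be immediate
        MOV  CL,4
        SHL  AX,CL              ;any other count goes in CL; AX = 0020H

        MOV  DX,3FDH
        IN   AL,DX              ;read one byte from port 3FDH
```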
- d. Flow of control instructions. These deserve study in themselves
- and we will discuss them a little more. They include
- 1) CALL, RET -- call and return
- 2) INT, IRET -- interrupt and return-from-interrupt
- 3) JMP -- jump or "branch"
- 4) LOOP, LOOPNZ, LOOPZ -- special (and useful) instructions which
- implement a counted loop similar to the 370 BCT instruction
- 5) various conditional jump instructions
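- A sketch of a subroutine call combined with conditional jumps. UPCASE is
- a hypothetical routine which converts the character in AL to upper case.
- The jumps after CMP use the unsigned conditions JB (below) and JA
- (above).

```asm
        MOV  AL,'q'
        CALL UPCASE             ;returns with AL = 'Q'
;(mainline continues here)

UPCASE  PROC NEAR
        CMP  AL,'a'
        JB   UPDONE             ;below 'a': not a lower case letter
        CMP  AL,'z'
        JA   UPDONE             ;above 'z': not a lower case letter
        SUB  AL,'a'-'A'         ;convert to upper case
UPDONE: RET
UPCASE  ENDP
```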
- e. String instructions. These implement a limited storage-to-storage
- instruction subset and are quite powerful. All of them have the
- property that
- 1) The source of data is described by the combination DS and SI.
- 2) The destination of data is described by the combination ES and
- DI.
- 3) As part of the operation, the SI and/or DI register(s) is(are)
- incremented or decremented so the operation can be repeated.
- They include
- 1) CMPSB/CMPSW -- compare byte or word
- 2) LODSB/LODSW -- load byte or word into AL or AX
- 3) STOSB/STOSW -- store byte or word from AL or AX
- 4) MOVSB/MOVSW -- move byte or word
- 5) SCASB/SCASW -- compare byte or word with contents of AL or AX
- 6) REP/REPE/REPNE -- a prefix which can be combined with any of
- the above instructions to make them execute repeatedly across a
- string of data whose length is held in CX.
- f. Flag instructions: CLI, STI, CLD, STD, CLC, STC. These can set or
- clear the interrupt (enabled), direction (for string operations), or
- carry flags.
- The addressing summary and the instruction summary given above mask a
- lot of annoying little exceptions. For example, you can't POP CS, and
- although the R <-- M form of LES is legal, the M <-- R form isn't, and
- so on. My advice is
- a. Go for the general rules
- b. Don't try to memorize the exceptions
- c. Rely on common sense and the assembler to teach you about
- exceptions over time. A lot of the exceptions cover things you
- wouldn't want to do anyway.
- 5. A few instructions are rich enough and useful enough to warrant careful
- study. Here are a few final study guidelines:
- a. It is well worth the time learning to use the string instruction
- set effectively. Among the most useful are
- REP MOVSB ;moves a string
- REP STOSB ;initializes memory
- REPNE SCASB ;look up occurrence of character in string
- REPE CMPSB ;compare two strings
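- Each of these idioms depends on registers set up beforehand. Here is a
- sketch for the first and third, assuming hypothetical buffers SOURCE and
- TARGET of LEN bytes each (LEN not zero), with DS and ES both already
- addressing the segment which contains them.

```asm
        CLD                     ;direction flag clear: SI and DI count upward
        MOV  SI,OFFSET SOURCE   ;DS:SI is where the data comes from
        MOV  DI,OFFSET TARGET   ;ES:DI is where the data goes
        MOV  CX,LEN             ;CX is the number of bytes
        REP  MOVSB              ;copy LEN bytes from SOURCE to TARGET

        MOV  DI,OFFSET TARGET   ;ES:DI is the string to search
        MOV  CX,LEN             ;CX is the most bytes to examine
        MOV  AL,'$'             ;AL is the character to look for
        REPNE SCASB             ;scan until a match or until CX runs out
        JNE  NOTFND             ;zero flag clear: no '$' in the string
        DEC  DI                 ;DI was stepped past the match; back up one
;(ES:DI now addresses the '$' which was found)
NOTFND:
```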
- b. Similarly, if you have never written for a stack machine before,
- you will need to exercise PUSH and POP and get very comfortable
- with them because they are going to be good friends. If you are
- used to the 370, with lots of general purpose registers, you may
- find yourself feeling cramped at first, with many fewer registers
- and many instructions having register restrictions. But, you have
- a hidden ally: you need a register and you don't want to throw
- away what's in it? Just PUSH it, and when you are done, POP it
- back. This can lead to abuse. Never have more than two
- "expedient" PUSHes in effect and never leave something PUSHed
- across a major header comment or for more than 15 instructions or